
Conversation

@oowekyala
Contributor

Note: this is a reopening of #118987 which I inadvertently closed.


Improve the affine scalar replacement pass to identify memref accesses that are used as reduction variables, and turn them into iter_args variables. For instance, in:

%x = memref.alloc(): memref<10x10xf32>
%min = memref.alloc(): memref<10xf32>
// initialize %min
affine.for %i = 0 to 10 {
   affine.for %j = 0 to 10 {
      %0 = memref.load %min[%i]: memref<10xf32>
      %1 = memref.load %x[%i, %j]: memref<10x10xf32>
      %2 = arith.minimumf %0, %1: f32
      memref.store %2, %min[%i] : memref<10xf32>
   }
}

the load/store pattern on %min in the inner loop is characteristic of a reduction. The memory location %min[%i] is invariant with respect to the inner loop's induction variable, so it is effectively used as a scalar. We can rewrite this loop to the following:

%x = memref.alloc(): memref<10x10xf32>
%min = memref.alloc(): memref<10xf32>
// initialize %min
affine.for %i = 0 to 10 {
  %0 = memref.load %min[%i]: memref<10xf32>
  %1 = affine.for %j = 0 to 10 iter_args(%acc = %0) -> f32 {
    %2 = memref.load %x[%i, %j]: memref<10x10xf32>
    %3 = arith.minimumf %acc, %2: f32
    affine.yield %3 : f32
  }
  memref.store %1, %min[%i] : memref<10xf32>
}

where this memory location is "scalarized" as an iter_args variable. This allows existing affine passes to apply more optimizations to the reduction loop: e.g., it can be vectorized, or it can be turned into an affine.parallel loop with a combiner for the reduction.
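As an illustration of that last point (my own sketch, not output of this patch), the scalarized inner loop could then be expressed as a parallel reduction, assuming a "minimumf" combiner is available for the reduce clause:

```mlir
// Sketch only: a possible affine.parallel form of the scalarized
// inner loop. The combiner name "minimumf" is an assumption about
// the supported reduction kinds.
%1 = affine.parallel (%j) = (0) to (10) reduce ("minimumf") -> (f32) {
  %2 = memref.load %x[%i, %j] : memref<10x10xf32>
  affine.yield %2 : f32
}
```

None of this parallelization is possible while the accumulator lives in memory, since every iteration would carry a load/store dependence on %min[%i].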

This kind of code pattern is often found in the affine loops generated from linalg code, so I think this is a very useful addition.

I expect maybe some pushback over why I put this into the scalar replacement pass instead of a new pass. I think this is justified because:

  1. This transformation moves some loads and stores out of the loop, and these may be forwardable by the existing scalar replacement transformations. Conversely, forwarding some loads and stores may free up dependencies that make this new loop rewriting pattern applicable. So to me these transformations are tightly related, and maybe they should even be put into a fixed-point loop within the scalrep pass.
  2. This transformation effectively replaces buffer accesses with a scalar iter_args variable. So even if it seems unrelated to the load/store forwarding that the pass currently performs, I think it still fits within the scope of --affine-scalrep.
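To make the intended semantics concrete, here is a small Python model (my own sketch, not part of the patch) checking that the memref-based inner loop and the iter_args-style accumulator compute the same values:

```python
# Python model of the two IR forms from the example above.
import math

def reduce_via_memory(x, mins):
    """Model of the original IR: load/store %min[%i] inside the inner loop."""
    for i in range(len(mins)):
        for j in range(len(x[i])):
            mins[i] = min(mins[i], x[i][j])  # load, arith.minimumf, store
    return mins

def reduce_via_iter_args(x, mins):
    """Model of the rewritten IR: the accumulator is an iter_args value."""
    for i in range(len(mins)):
        acc = mins[i]                        # hoisted load
        for j in range(len(x[i])):
            acc = min(acc, x[i][j])          # iter_args accumulator
        mins[i] = acc                        # hoisted store
    return mins

x = [[3.0, 1.0, 2.0], [5.0, 4.0, 6.0]]
a = reduce_via_memory(x, [math.inf, math.inf])
b = reduce_via_iter_args(x, [math.inf, math.inf])
assert a == b == [1.0, 4.0]
```

The rewrite is only valid when nothing else reads or writes %min[%i] inside the inner loop, which is what the pass has to check.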

Thanks for reading!

Restrict isValidDim to induction vars, and not iter_args
…tomaticAllocationScope

This doesn't affect their behavior when they are just
called from the command line, since these switches
will target the ops nested within the module (which
before were func.func, and now can be other ops).

Fix affine tests

Fix anchor for some passes

TODO: a better fix would be to improve the PassManager
to support passes that can run at an arbitrary nesting
level. Then these passes could continue to target the
root op properly. The fix with Nesting::ImplicitAny is
really bad because it changes the anchor of OperationPass<>
passes (any anchor) to target children of the root instead
of the root.

Fix pass manager nesting again

And canonicalize a single-element memref copy
into a load and a store. This allows SROA to
scalarize the copied elements. Currently
SROA cannot handle forwarding of one memory slot
to another.
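Assuming I read the commit correctly, that canonicalization would rewrite a copy of a single-element memref roughly like this (my sketch; exact op spellings may differ):

```mlir
// Before: copying one element through memref.copy.
memref.copy %src, %dst : memref<1xf32> to memref<1xf32>

// After: the same copy as an explicit load and store,
// which SROA can then scalarize.
%c0 = arith.constant 0 : index
%v = memref.load %src[%c0] : memref<1xf32>
memref.store %v, %dst[%c0] : memref<1xf32>
```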
causing relinking of many libs when
a change to Pass.cpp is made
It had a bug whereby the regions of an
op are marked as dead code, and their
successors are not properly populated,
if any operand of a RegionBranchOpInterface
is not folded to a constant. The interface
is already meant to support partially folded
operands, so we should use that instead of giving up.

Some changes are also just to make debug prints
more helpful. Some of them were printing pointer
addresses.
The problem is that the live-in set is not
necessarily already populated when we ask
for the start op
in the case where the subview op uses
an index of an affine loop as, e.g., the
offset. The previous implementation
always generated symbols, and the
verifier failed, although the
transformation is valid if you generate
dims for the variables that are not
valid symbols but are valid dims.
and not just plain dims and symbols. This makes it possible
to fold memref.subview ops that use an affine expression of
valid symbols and dims as an offset, even if that expression
is computed by arith ops like muli and addi.
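For illustration, the subview case described above looks roughly like this (my sketch, not from the patch; the offset is an affine function of a loop induction variable, which must be classified as a dim, not a symbol):

```mlir
%c2 = arith.constant 2 : index
affine.for %i = 0 to 8 {
  // %i is a valid dim but not a valid symbol; an offset computed
  // from it by arith.muli can still be folded into an affine map,
  // provided %i is passed as a dim operand rather than a symbol.
  %off = arith.muli %i, %c2 : index
  %sv = memref.subview %buf[%off] [2] [1]
      : memref<16xf32> to memref<2xf32, strided<[1], offset: ?>>
}
```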
@oowekyala oowekyala closed this Apr 30, 2025
